Pseudo relevance feedback method for dense retrieval

Wenhao HU, Jing LUO, Xinhui TU

Journal of Computer Applications 2023, 43 (4): 1036-1042. DOI: 10.11772/j.issn.1001-9081.2022030480

Pseudo Relevance Feedback (PRF) is an automated Query Expansion (QE) technique that combines the original query with information from the top N documents of the initial retrieval to build a more accurate query, which can further improve the performance of retrieval systems. However, existing PRF methods for dense retrieval have two problems: loss of semantic information due to text truncation, and high time complexity in the retrieval stages. To address these problems, a PRF method based on paragraph-level granularity that can be applied to dense retrieval over long texts, namely Dense-PRF, was proposed. Firstly, the embeddings of the relevant paragraphs in the top N documents of the initial retrieval were obtained by semantic distance calculation. Secondly, the QE term embeddings were obtained by average pooling of the relevant paragraph embeddings. Thirdly, new query embeddings were constructed by combining the original query embeddings and the QE term embeddings according to their weights. Finally, the final retrieval results were obtained using the new query embeddings. In experiments comparing Dense-PRF with baseline models on two classic long-text test datasets, Robust04 and WT2G, Dense-PRF improved the accuracy and Normalized Discounted Cumulative Gain (NDCG) of the top 20 documents over the RepBERT+BM25 model by 1.66 and 1.32 percentage points and by 2.30 and 1.91 percentage points, respectively. Experimental results demonstrate that Dense-PRF can effectively alleviate the vocabulary mismatch between queries and documents and improve retrieval accuracy.
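
The four steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the number of paragraphs kept, or the weighting scheme, so cosine similarity, the parameter `k`, the interpolation weight `alpha`, and the function names `dense_prf_query` and `rerank` are all assumptions made here for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length (guarding against division by zero)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def dense_prf_query(query_emb, paragraph_embs, k=3, alpha=0.5):
    """Build an expanded query embedding from feedback paragraphs.

    query_emb:      shape (d,), embedding of the original query.
    paragraph_embs: shape (m, d), embeddings of paragraphs taken from the
                    top N documents of the initial retrieval.
    k:              number of paragraphs kept as "relevant" (assumed).
    alpha:          weight of the expansion embedding (assumed).
    """
    # Step 1: pick the k paragraphs closest to the query.
    # Cosine similarity is an assumption; the abstract only says
    # "semantic distance calculation".
    sims = l2_normalize(paragraph_embs) @ l2_normalize(query_emb)
    top_idx = np.argsort(-sims)[:k]

    # Step 2: average pooling of the selected paragraph embeddings.
    expansion = paragraph_embs[top_idx].mean(axis=0)

    # Step 3: weighted combination of original and expansion embeddings.
    # Linear interpolation is an assumed form of "according to their weights".
    return (1.0 - alpha) * query_emb + alpha * expansion


def rerank(query_emb, doc_embs):
    """Step 4: rank documents by inner product with the new query embedding."""
    scores = doc_embs @ query_emb
    return np.argsort(-scores)
```

Because the expansion works on precomputed paragraph embeddings, the feedback step in this sketch needs only a similarity computation, a mean, and a weighted sum, with no second pass through the encoder.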
